Enhancing fracture diagnosis in pelvic X-rays by deep convolutional neural network with synthesized images from 3D-CT

Pelvic fractures pose significant challenges in medical diagnosis due to the complex structure of the pelvic bones. Timely diagnosis of pelvic fractures is critical to reduce complications and mortality rates. While computed tomography (CT) is highly accurate in detecting pelvic fractures, the initial diagnostic procedure usually involves pelvic X-rays (PXR). In recent years, many deep learning-based methods have been developed utilizing ImageNet-based transfer learning for diagnosing hip and pelvic fractures. However, the ImageNet dataset contains natural RGB images, which differ from PXR images. In this study, we proposed a two-step transfer learning approach that improved the diagnosis of pelvic fractures in PXR images. The first step involved training a deep convolutional neural network (DCNN) using synthesized PXR images derived from 3D-CT by digitally reconstructed radiographs (DRR). In the second step, the classification layers of the DCNN were fine-tuned using acquired PXR images. The performance of the proposed method was compared with the conventional ImageNet-based transfer learning method. Experimental results demonstrated that the proposed DRR-based method, using 20 synthesized PXR images for each CT, achieved superior performance, with areas under the receiver operating characteristic curve (AUROCs) of 0.9327 and 0.8014 for visible and invisible fractures, respectively. The ImageNet-based method yielded AUROCs of 0.8908 and 0.7308 for visible and invisible fractures, respectively.

Insufficiency fractures are caused by repetitive stress, and some are practically invisible in PXR images [4]. Similarly, fractures associated with osteoporosis are also challenging to detect in PXR images [10]. Many osteoporosis fractures are invisible in their initial stage of development without an appropriate viewing angle. Deep learning has been demonstrated to be effective in learning subtle features and patterns to assist in the diagnosis of different diseases [11-13]. Hence, a deep learning-based assistive system could prove valuable in recognizing PXR images with visible fractures as well as invisible fractures.
In the initial stages, fracture detection methods relied on image processing techniques and computational models such as morphological operations with the Hough transform [14], the neighbor-conditional shape model [15], and the relaxed digital straight-line segment (RDSS) [16]. However, these methods depended on numerous parameters and were susceptible to subject-specific limitations. Recently, deep learning has gained popularity for detecting various fractures, such as wrist fractures [17], rib fractures [18], femur fractures [19], femoral neck fractures [20], and vertebral fractures [21]. Deep learning methods have likewise been proposed for hip and pelvic fracture detection. Krogue et al. [22] proposed a DenseNet-based method for detecting the hip region and classifying fractures in PXR images. The binary classification accuracy achieved was 93.7%, and the multi-class classification accuracy was 90.8%. Kitamura [23] also introduced a method based on the DenseNet121 model, where the model was trained to create position labeling and detect hardware presence in PXR images. A separate model was used to detect different types of fractures. The area under the curve (AUC) for position and hardware detection was 0.99. The AUCs for proximal femoral fracture, pelvic fracture, and acetabular fracture were 0.95, 0.75, and 0.85, respectively. Another method used the YOLOv4-tiny deep learning model to detect 3 types of hip fractures [24]. The model's performance was also compared with that of doctors: the model achieved a sensitivity of 96.2%, while the sensitivity of the doctors varied from 69.2 to 96.2%. The study concluded that the performance of the trained model was comparable to that of attending physicians and chief residents in orthopedics, with no statistical difference, and that it outperformed first-year residents and general practitioners. Cheng et al. proposed a scalable deep learning algorithm named PelviXNet for universal trauma detection on PXR images [25]. PelviXNet combined a feature pyramid network (FPN) with DenseNet-169 and was trained using weakly supervised point-annotated PXR images. The trained PelviXNet yielded an area under the receiver operating characteristic curve (AUROC) of 0.973 on a clinical population test set. All of the above methods addressed fractures that are visible on PXR images.
Another challenge associated with deep learning is the significant amount of data required to effectively train a model. However, obtaining a substantial number of annotated medical images is often difficult. A common practice in this field is to use transfer learning [26]. In transfer learning, a deep learning model is initially trained on a large dataset, typically ImageNet [27], for a classification task. Later, only the final layers are fine-tuned with the task-specific dataset. This approach was applied in previous studies on hip and pelvic fractures [22-25]. However, a recent study has demonstrated a more efficient three-step training scheme for transfer learning, which reduced the labeled medical image requirements by 688-fold compared to the conventional two-step transfer learning while maintaining similar performance [28]. In that three-step training process, the deep learning model was first initialized with the ImageNet dataset [27]. Then, in the second step, the model was re-trained using a large chest X-ray (CXR) dataset to detect normal and abnormal cases. Finally, in the third step, the model was trained with a small dataset to detect a specific pulmonary disease. Another study utilized plain radiographs to train a deep learning model for detecting limbs, and then fine-tuned the model using PXR images for hip fracture detection [29]. The accuracy of hip fracture detection reached 91%.
A subset of deep learning, the deep convolutional neural network (DCNN), has demonstrated remarkable performance across diverse applications including image classification [30,31], object detection [32-35], and video processing [36,37]. One of the key characteristics of DCNNs is their ability to recognize and extract features automatically, without human supervision [38,39]. This capability enables DCNNs to generate equivalent representations, facilitate sparse interactions, and implement parameter sharing [40]. As a result, different DCNNs have been used for the diagnosis and detection of various diseases [41]. Ibrahim et al. introduced a modified norm-VGG16 DCNN for the diagnosis of COVID-19 and its severity levels [42]. Inoue et al. utilized the Faster-RCNN-Inception-V2-COCO DCNN to automatically detect fractures in whole-body trauma CT [43]. Ukai et al. used the DCNN-based YOLOv3 to detect fractures in images extracted from multiple orientations of 3D-CT [44]. Cina et al. proposed a method that used several DCNNs for the localization of landmarks in spine radiographs [45].
To address the lack of substantial amounts of annotated medical images to train a DCNN, this study introduces a novel two-step transfer learning approach based on the digitally reconstructed radiograph (DRR). In the first step, a DCNN is trained using different numbers of synthesized PXR images derived from 3D-CT by DRR. The second step involves fine-tuning the classification layers of the DCNN using acquired PXR images. Another contribution of this study is the performance evaluation of the DCNN on different PXR datasets categorized by fracture visibility. Furthermore, the performance of the proposed method is compared with the conventional ImageNet-based transfer learning method and with combinations of the DRR-based and ImageNet-based methods. The proposed DRR-based method, using 20 synthesized PXR images for each CT, achieved AUROCs of 0.9327 and 0.8014 for visible and invisible fractures, respectively. The ImageNet-based method yielded AUROCs of 0.8908 and 0.7308 for visible and invisible fractures, respectively.
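To make the synthesis step concrete, the following is a minimal sketch of how a DRR can be computed from a CT volume. It is an illustration under stated assumptions, not the pipeline used in this study: it assumes a parallel-beam geometry, nearest-neighbour sampling, rotation about a single axis, and a volume that already holds attenuation coefficients. The function names, the `max_angle_deg` range, and the `voxel_size` parameter are hypothetical.

```python
import math
import random


def synthesize_drr(volume, angle_deg, voxel_size=1.0):
    """Parallel-beam DRR of volume[z][y][x] (attenuation coefficients).

    The volume is rotated by angle_deg about the z axis and rays are cast
    along the y axis. Geometry and sampling are simplifying assumptions.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    cx, cy = (nx - 1) / 2.0, (ny - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    image = []
    for z in range(nz):
        row = []
        for u in range(nx):            # detector column
            line_integral = 0.0
            for v in range(ny):        # step along the ray
                # Map the ray sample back into volume coordinates.
                du, dv = u - cx, v - cy
                x = int(round(cos_a * du - sin_a * dv + cx))
                y = int(round(sin_a * du + cos_a * dv + cy))
                if 0 <= x < nx and 0 <= y < ny:
                    line_integral += volume[z][y][x] * voxel_size
            # Beer-Lambert law: fraction of the beam that is transmitted.
            row.append(math.exp(-line_integral))
        image.append(row)
    return image


def synthesize_views(volume, n_views, max_angle_deg=15.0, seed=0):
    """Generate n_views DRRs at random rotation angles (range is hypothetical)."""
    rng = random.Random(seed)
    return [
        synthesize_drr(volume, rng.uniform(-max_angle_deg, max_angle_deg))
        for _ in range(n_views)
    ]
```

Each call to `synthesize_views` with a different `n_views` (for example 10, 20, or 74) yields a different number of projections from the same CT, which is the quantity varied in this study.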

Results

Distribution of PXR dataset
In the PXR dataset, there were primarily two classes of images: the 'fracture' class, consisting of images with fractures, and the 'normal' class, comprising images without any fractures. After excluding PXR images with implants and those showing only partial pelvic regions, the remaining PXR images with fractures were further categorized into three groups based on the visibility of fractures: PXROV, PXRIV, and PXRVIV. PXROV included PXR images with visible fractures, PXRIV included PXR images without visible fractures but with fractures identified in the corresponding 3D-CT,

DRR-based method
Applying DRR to a single 3D-CT image can synthesize numerous radiographic images. For this study, three DRR datasets, namely DRR10, DRR20, and DRR74, were synthesized by randomly rotating the 3D-CT. The datasets contained 10, 20, and 74 synthesized images per 3D-CT, respectively. The 3D-CT of subjects with fractures included in the PXROV, PXRVIV, or PXRIV dataset, as well as 3D-CT with implants, were excluded. After exclusions, a total of 349 3D-CT remained, of which 152 had fractures and 197 were normal. DRR was applied only to the pelvic region of the 3D-CT. The DCNN was trained separately using the DRR10, DRR20, and DRR74 datasets, with fivefold cross-validation. The best model for each dataset was selected, and only the Fully-Connected (FC), SoftMax (SM), and Classification (CL) layers were fine-tuned using the PXROV dataset. The overview of the DRR-based method is shown in Fig. 2. The areas under the receiver operating characteristic curve (AUROCs) for PXROV diagnosis using models trained with DRR10, DRR20, and DRR74 were 0.9406, 0.9327, and 0.9211, respectively. The ROC curves of PXROV diagnosis by models trained with DRR10, DRR20, and DRR74 are shown in Fig. 3.
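As a rough illustration of the dataset sizes and the cross-validation setup, the sketch below splits the 349 CTs into five folds and expands each fold into synthesized images. Two points are assumptions rather than details given in the text: that every CT contributes exactly the stated number of views, and that folds are formed at the CT level so that views of the same CT never appear in both training and validation data. The helper names are hypothetical.

```python
import random


def ct_level_folds(ct_ids, n_folds=5, seed=0):
    """Assign each CT to one fold so its synthesized views stay together."""
    ids = list(ct_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]


def expand_to_images(ct_fold, views_per_ct):
    """List (ct_id, view_index) pairs for every synthesized image in a fold."""
    return [(ct, view) for ct in ct_fold for view in range(views_per_ct)]


n_ct = 152 + 197  # fractured + normal CTs remaining after exclusions
folds = ct_level_folds(range(n_ct))

for name, views in [("DRR10", 10), ("DRR20", 20), ("DRR74", 74)]:
    total = sum(len(expand_to_images(fold, views)) for fold in folds)
    print(name, total)  # 3490, 6980 and 25826 images, respectively
```

Under these assumptions the three datasets differ only in the number of views per CT, so any performance difference between them reflects view count rather than the number of subjects.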
Additionally, for the models pre-trained with DRR10, DRR20, and DRR74, the F1 scores for PXROV were 0.847, 0.895, and 0.860, respectively. Because DRR20 gave the highest F1 score, it was chosen for additional analysis and comparison. Grad-CAM was used to visualize the fracture region. Figure 4 shows examples of Grad-CAM results overlaid on PXR images to visualize the relevant regions.
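The F1 score behind this choice can be sketched as follows. The Discussion states that a confidence score threshold of 0.5 was used; whether a score exactly equal to the threshold counts as a fracture is an assumption here, and the toy scores and labels are invented for illustration only.

```python
def f1_at_threshold(scores, labels, threshold=0.5):
    """F1 score for binary labels (1 = fracture) at a confidence threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0  # tie handling is an assumption
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (invented data): 2 true positives, 1 false positive, 1 false negative.
toy_f1 = f1_at_threshold([0.9, 0.6, 0.4, 0.7, 0.2], [1, 1, 1, 0, 0])

# Selecting the pre-training dataset by the F1 scores reported above.
reported_f1 = {"DRR10": 0.847, "DRR20": 0.895, "DRR74": 0.860}
best = max(reported_f1, key=reported_f1.get)
print(best)  # DRR20
```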

Comparison between DRR-based and conventional method for detecting PXR image with visible fracture
In this step, we implemented four pre-training approaches: DRR20, ImageNet, ImageNet + DRR20, and ImageNet + DRR20_Full. The ImageNet approach involved training a DCNN model initially on the ImageNet dataset, followed by fine-tuning the FC, SM, and CL layers using PXR images. In the DRR20 approach, the DCNN model was trained using the DRR20 dataset, and then the FC, SM, and CL layers were fine-tuned with PXR images. For the ImageNet + DRR20 approach, we re-trained the DCNN model pre-trained on ImageNet with the DRR20 dataset, and subsequently fine-tuned the FC, SM, and CL layers with PXR images. Lastly, in the ImageNet + DRR20_Full approach, the DCNN model pre-trained on ImageNet was first re-trained with the DRR20 dataset, and then the entire DCNN model was fine-tuned using PXR images.
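The four approaches differ only in which pre-training stages are applied and in which layers are updated during fine-tuning. The sketch below encodes that description as a small configuration table; the class, field names, and stage strings are hypothetical and serve only to summarize the text, not to reproduce the training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainScheme:
    name: str
    imagenet_init: bool    # start from ImageNet-pretrained weights
    drr_training: bool     # train (or re-train) on the DRR20 dataset
    finetune_scope: str    # "head" = FC, SM and CL layers; "all" = entire DCNN


SCHEMES = [
    PretrainScheme("DRR20", False, True, "head"),
    PretrainScheme("ImageNet", True, False, "head"),
    PretrainScheme("ImageNet + DRR20", True, True, "head"),
    PretrainScheme("ImageNet + DRR20_Full", True, True, "all"),
]


def training_stages(scheme):
    """Return the ordered training stages implied by a scheme."""
    stages = []
    if scheme.imagenet_init:
        stages.append("load ImageNet-pretrained weights")
    if scheme.drr_training:
        stages.append("train on DRR20 synthesized images")
    scope = "entire DCNN" if scheme.finetune_scope == "all" else "FC, SM and CL layers"
    stages.append(f"fine-tune {scope} on PXR images")
    return stages


for scheme in SCHEMES:
    print(scheme.name, "->", "; ".join(training_stages(scheme)))
```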

Discussion
In this study, our hypothesis was that pre-training a DCNN with synthesized images would enhance its performance in detecting PXR images with fractures. As DRR is a process of projecting a 3D volume onto a 2D plane, the synthesized PXRs generated by random rotation contain unique anatomical variations. In contrast, conventional augmentation methods alter the locations of fractures or intensities without introducing any new anatomical variations. Hence, we proposed a DRR-based method, where the DCNN was pre-trained using synthesized PXR images generated from 3D-CT images by DRR. We also investigated the impact of the number of synthesized images on the DCNN's performance. We evaluated the AUROC for detecting PXR images with visible fractures and calculated F1 scores using a confidence score threshold of 0.5. Among 10, 20, and 74 synthesized PXR images from each 3D-CT, the AUROCs were similar for detecting PXR images with visible fractures (Fig. 3). The DCNN pre-trained with 20 synthesized PXR images achieved the highest F1 score. Next, we compared the performance of the DRR-based method with the conventional ImageNet-based transfer learning approach, as well as combinations of both methods (Fig. 5). The results are summarized in Table 2. When detecting PXR images with visible fractures using the PXROV dataset, the DRR20 method achieved the highest AUROC and F1 score of 0.9327 and 0.895, respectively. Similarly, for the detection of PXR images with visible fractures using the PXRVIV and PXROVIV datasets for fine-tuning the DCNN, the DRR20 method also achieved the highest AUROC and F1 score. Hence, irrespective of variations in the fine-tuning data based on fracture visibility, the DRR20 method outperformed the ImageNet-based method. Furthermore, we explored the combination of the DRR20-based and ImageNet-based methods through the ImageNet + DRR20 and ImageNet + DRR20_Full approaches. Although the AUROC values for these combinations surpassed those obtained using the ImageNet-based method, they remained lower than those of the DRR20-based method in almost all cases. These findings demonstrate that pre-training the DCNN with a synthesized dataset tailored to the target task enhances the learning of relevant features.
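For readers unfamiliar with the metric used in these comparisons, the AUROC can be read as the probability that a randomly chosen fracture image receives a higher confidence score than a randomly chosen normal image. The sketch below computes it directly from that pairwise definition; it is a generic illustration with invented toy data, not the evaluation code of this study, and counting ties as one half is a common convention assumed here.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (fracture, normal) pairs ranked correctly.

    labels: 1 = fracture, 0 = normal. Tied scores count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUROC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (invented data): 4 of the 6 fracture/normal pairs are ordered correctly.
toy_auroc = auroc([0.9, 0.6, 0.4, 0.7, 0.2], [1, 1, 1, 0, 0])
```

Unlike the F1 score, this quantity does not depend on the 0.5 threshold, which is why two models can have similar AUROCs but different F1 scores.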
During the synthesis of PXR images using DRR, fractures that were present in the 3D-CT data were sometimes obstructed due to rotations in the 3D plane. As a result, the synthesized PXR dataset contained images with visible fractures, images without visible fractures, and normal images without any fractures. We anticipated that the trained DCNNs would capture certain unique features associated with fractures that were not visible in the images. To test this assumption, we evaluated the performance of the trained DCNNs on the PXRIV dataset. The DRR-based method demonstrated promising results in this scenario as well. Regardless of the type of fine-tuning data, DRR20 achieved the highest AUROCs (Fig. 6) and F1 scores (Table 1).
Although the DRR-based method achieved the highest AUROC for detecting PXR images with visible and invisible fractures, the AUROC for detecting PXR images with invisible fractures was significantly lower. This observation was also valid for the ImageNet-based method. This trend was expected since the DCNNs were not trained with the PXRIV dataset. Figure 7 illustrates the comparison of AUROCs for different fine-tuning data.
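The size of this drop can be worked out from the AUROCs reported in the abstract, assuming that each visible/invisible pair was obtained with the same fine-tuning configuration (the text does not state this explicitly for the invisible-fracture values):

```latex
\begin{aligned}
\Delta_{\mathrm{DRR20}} &= 0.9327 - 0.8014 = 0.1313,\\
\Delta_{\mathrm{ImageNet}} &= 0.8908 - 0.7308 = 0.1600.
\end{aligned}
```

Under that assumption, the ImageNet-based method loses about 0.16 in AUROC when moving from visible to invisible fractures, compared with about 0.13 for DRR20.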
From Fig. 7a, we can see that the decrease in AUROC from detecting PXR images with visible fractures to detecting PXR images with invisible fractures was higher for ImageNet than for DRR20 when using PXROV as the fine-tuning data. However, from Fig. 7b and c, it can be seen that the decrease in AUROC became similar for ImageNet and DRR20 when fine-tuning the DCNNs with the PXRVIV and PXROVVIV datasets. This suggests that a fine-tuning dataset that included some PXR images with invisible fractures improved the detection of PXR images with invisible fractures. Therefore, accurate annotation of data was crucial for enhancing the performance of the DCNN in detecting PXR images with invisible fractures. As DRR20 achieved the highest AUROC and F1 score for visible fracture (PXROV) diagnosis, we can conclude that DRR20 is the best method among DRR20, ImageNet, ImageNet + DRR20, and ImageNet + DRR20_Full. Furthermore, even though the DCNN was not optimized with PXRs that had only invisible fractures (PXRIV dataset), DRR20 demonstrated promising AUROC and F1 score for detecting PXRs with invisible fractures. The reason for the better performance of the proposed method is that the DCNN was pre-trained using synthesized PXR images. As a result, the FC layers along with the ResNet101 backbone were specifically tuned for pelvic fracture diagnosis. In contrast, in the conventional transfer learning method the ResNet101 backbone was pre-trained on the ImageNet dataset, which does not contain the characteristics of pelvic fractures. Hence, this method can significantly contribute to the improved diagnosis of pelvic fractures, leading to a reduction in morbidity and mortality. However, the evaluation of pelvic fracture detection performance was limited to a single DCNN with different pre-training schemes. Given the unique characteristics of pelvic fractures, it is important to further evaluate the method using various types of DCNNs before considering practical implementation. Additionally, it is important to note that this was a retrospective study and that the data came from a single institute, which introduces the possibility of population bias. Moreover, the selection of PXR images and 3D-CT scans was performed randomly, potentially introducing selection bias. Therefore, the interpretation of the findings may differ when applied to other institutes or populations. Consequently, it is crucial to validate the proposed method using larger and more diverse datasets to establish its usefulness in different hospital settings.

Figure 2. Overview of DRR-based method for PXR with fracture detection.

Table 1. AUROC and F1 scores of different DCNNs on PXRIV dataset. Significant values are in bold.

Table 2. AUROC and F1 scores of different DCNNs on PXROV dataset. Significant values are in bold.